83 research outputs found

    AceView: a comprehensive cDNA-supported gene and transcripts annotation

    BACKGROUND: Regions covering one percent of the genome, selected by ENCODE for extensive analysis, were annotated by the HAVANA/Gencode group with high quality transcripts, thus defining a benchmark. The ENCODE Genome Annotation Assessment Project (EGASP) competition aimed at reproducing Gencode and finding new genes. The organizers evaluated the protein predictions in depth. We present a complementary analysis of the mRNAs, including alternative transcript variants. RESULTS: We evaluate 25 gene tracks from the University of California Santa Cruz (UCSC) genome browser. We either distinguish or collapse the alternative splice variants, and compare the genomic coordinates of exons, introns and nucleotides. Whole mRNA models, seen as chains of introns, are sorted to find the best matching pairs, and compared so that each mRNA is used only once. At the mRNA level, AceView is by far the closest to Gencode: the vast majority of transcripts of the two methods, including alternative variants, are identical. At the protein level, however, due to a lack of experimental data, our predictions differ: Gencode annotates proteins in only 41% of the mRNAs whereas AceView does so in virtually all. We describe the driving principles of AceView, and how, by performing hand-supervised automatic annotation, we solve the combinatorial splicing problem and summarize all of GenBank, dbEST and RefSeq into a genome-wide non-redundant but comprehensive cDNA-supported transcriptome. AceView accuracy is now validated by Gencode. CONCLUSION: Relative to a consensus mRNA catalog constructed from all evidence-based annotations, Gencode and AceView have 81% and 84% sensitivity, and 74% and 73% specificity, respectively. This close agreement validates a richer view of the human transcriptome, with three to five times more transcripts than in UCSC Known Genes (sensitivity 28%), RefSeq (sensitivity 21%) or Ensembl (sensitivity 19%).
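The intron-chain comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the exact-match criterion, and the reading of "specificity" as matched transcripts divided by predicted transcripts (the convention common in gene-prediction benchmarks) are all assumptions made for illustration.

```python
from collections import Counter


def intron_chain(exons):
    """Return the chain of introns implied by a transcript's (start, end) exons.

    Each intron is recorded as (end of left exon, start of right exon).
    A single-exon transcript yields an empty chain.
    """
    exons = sorted(exons)
    return tuple(
        (left_end, right_start)
        for (_, left_end), (right_start, _) in zip(exons, exons[1:])
    )


def match_one_to_one(predicted, reference):
    """Count identical intron chains, using each transcript at most once.

    Multiset intersection enforces the one-to-one pairing: a reference chain
    that occurs once can be matched by only one predicted transcript.
    Single-exon transcripts are skipped here (an assumption of this sketch),
    since they have no introns to compare.
    """
    pred = Counter(c for c in map(intron_chain, predicted) if c)
    ref = Counter(c for c in map(intron_chain, reference) if c)
    return sum((pred & ref).values())


def sensitivity_specificity(predicted, reference):
    """Sensitivity = matched / reference; specificity = matched / predicted."""
    matched = match_one_to_one(predicted, reference)
    sensitivity = matched / len(reference) if reference else 0.0
    specificity = matched / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical toy coordinates, not taken from any annotation.
reference = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (500, 600)],
    [(1000, 1100), (1200, 1300)],
]
predicted = [
    [(100, 200), (300, 400), (500, 600)],  # identical intron chain
    [(100, 250), (500, 600)],              # different donor site
    [(1000, 1100), (1200, 1300)],          # identical intron chain
    [(2000, 2100), (2200, 2300)],          # not in the reference
]
sens, spec = sensitivity_specificity(predicted, reference)
```

Comparing intron chains rather than full exon coordinates makes the match insensitive to differences in transcript ends, which may be why the abstract frames mRNA models this way.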

    Chiral-Yang-Mills theory, non commutative differential geometry, and the need for a Lie super-algebra

    In Yang-Mills theory, the charges of the left and right massless Fermions are independent of each other. We propose a new paradigm where we remove this freedom and densify the algebraic structure of Yang-Mills theory by integrating the scalar Higgs field into a new gauge-chiral 1-form which connects Fermions of opposite chiralities. Using the Bianchi identity, we prove that the corresponding covariant differential is associative if and only if we gauge a Lie-Kac super-algebra. In this model, spontaneous symmetry breakdown naturally occurs along an odd generator of the super-algebra and induces a representation of the Connes-Lott non commutative differential geometry of the 2-point finite space. Comment: 17 pages, no figures

    Indecomposable doubling for representations of the type I Lie superalgebras sl(m/n) and osp(2/2n)

    We establish that for the type I Lie superalgebras sl(m/n) and osp(2/2n), each Kac module admits a one-parameter family of indecomposable double extensions. The result follows from the explicit evaluation of the H^1 Lie superalgebra cohomology valued in the tensor product of the module and its dual. Comment: 14 pages, LaTeX. Minor corrections and clarifications added. Citation added

    Construction of matryoshka nested indecomposable N-replications of Kac-modules of quasi-reductive Lie superalgebras, including the sl(m/n) and osp(2/2n) series

    We construct a new class of finite dimensional indecomposable representations of simple superalgebras which may explain, in a natural way, the existence of the heavier elementary particles. In type I Lie superalgebras sl(m/n) and osp(2/2n), one of the Dynkin weights labeling the finite dimensional irreducible representations is continuous. Taking the derivative, we show how to construct indecomposable representations recursively embedding N copies of the original irreducible representation, coupled by generalized Cabibbo angles, as observed among the three generations of leptons and quarks of the standard model. The construction is then generalized in the appendix to quasi-reductive Lie superalgebras. Comment: Revised version 2 with minor modifications. On the suggestion of the referee, we show that the construction does not apply to the psl(n/n) superalgebras. 15 pages, 32 references. Revised version 3: no modification except reformatting the bibliography and adding do

    Transcriptome sequencing of the Microarray Quality Control (MAQC) RNA reference samples using next generation sequencing

    BACKGROUND: Transcriptome sequencing using next-generation sequencing platforms will soon be competing with DNA microarray technologies for global gene expression analysis. As a preliminary evaluation of these promising technologies, we performed deep sequencing of cDNA synthesized from the Microarray Quality Control (MAQC) reference RNA samples using Roche's 454 Genome Sequencer FLX. RESULTS: We generated more than 3.6 million sequence reads of average length 250 bp for the MAQC A and B samples and introduced a data analysis pipeline for translating cDNA read counts into gene expression levels. Using BLAST, 90% of the reads mapped to the human genome and 64% of the reads mapped to the RefSeq database of well annotated genes with e-values ≤ 10^-20. We measured gene expression levels in the A and B samples by counting the numbers of reads that mapped to individual RefSeq genes in multiple sequencing runs to evaluate the MAQC quality metrics for reproducibility, sensitivity, specificity, and accuracy and compared the results with DNA microarrays and Quantitative RT-PCR (QRTPCR) from the MAQC studies. In addition, 88% of the reads were successfully aligned directly to the human genome using the AceView alignment programs with an average 90% sequence similarity to identify 137,899 unique exon junctions, including 22,193 new exon junctions not yet contained in the RefSeq database. CONCLUSION: Using the MAQC metrics for evaluating the performance of gene expression platforms, the ExpressSeq results for gene expression levels showed excellent reproducibility, sensitivity, and specificity that improved systematically with increasing shotgun sequencing depth, and quantitative accuracy that was comparable to DNA microarrays and QRTPCR. In addition, a careful mapping of the reads to the genome using the AceView alignment programs shed new light on the complexity of the human transcriptome including the discovery of thousands of new splice variants.
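As a rough illustration of translating read counts into expression levels, the sketch below counts reads per gene from alignment hits and scales the counts to reads per million. It is not the pipeline described in the abstract: the input layout (read id, gene id, e-value), the best-hit rule, and the per-million normalization are assumptions chosen for illustration. Only the e-value cutoff of 10^-20 is taken from the abstract.

```python
from collections import Counter


def count_reads_per_gene(hits, max_evalue=1e-20):
    """Count reads per gene, keeping one best hit per read.

    hits: iterable of (read_id, gene_id, evalue) tuples (hypothetical layout).
    Hits with an e-value above max_evalue are discarded; among the remaining
    hits for a read, the one with the lowest e-value is kept.
    """
    best = {}
    for read_id, gene_id, evalue in hits:
        if evalue > max_evalue:
            continue
        if read_id not in best or evalue < best[read_id][1]:
            best[read_id] = (gene_id, evalue)
    return Counter(gene_id for gene_id, _ in best.values())


def reads_per_million(counts):
    """Scale raw counts so runs of different sequencing depth are comparable."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gene: n * 1e6 / total for gene, n in counts.items()}


# Hypothetical hits, not real data.
hits = [
    ("r1", "GENE_A", 1e-30),
    ("r1", "GENE_B", 1e-25),  # weaker second hit for r1, ignored
    ("r2", "GENE_A", 1e-22),
    ("r3", "GENE_B", 1e-5),   # fails the e-value cutoff
    ("r4", "GENE_B", 1e-40),
]
counts = count_reads_per_gene(hits)
rpm = reads_per_million(counts)
```

Assigning each read to a single gene keeps the counts additive across sequencing runs, so counts from multiple runs can be summed before normalizing.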

    Large-scale identification and characterization of alternative splicing variants of human gene transcripts using 56 419 completely sequenced and manually annotated full-length cDNAs

    We report the first genome-wide identification and characterization of alternative splicing in human gene transcripts based on analysis of the full-length cDNAs. Applying both manual and computational analyses for 56 419 completely sequenced and precisely annotated full-length cDNAs selected for the H-Invitational human transcriptome annotation meetings, we identified 6877 alternative splicing genes with 18 297 different alternative splicing variants. A total of 37 670 exons were involved in these alternative splicing events. The encoded protein sequences were affected in 6005 of the 6877 genes. Notably, alternative splicing affected protein motifs in 3015 genes, subcellular localizations in 2982 genes and transmembrane domains in 1348 genes. We also identified interesting patterns of alternative splicing, in which two distinct genes seemed to be bridged, nested or having overlapping protein coding sequences (CDSs) of different reading frames (multiple CDS). In these cases, completely unrelated proteins are encoded by a single locus. Genome-wide annotations of alternative splicing, relying on full-length cDNAs, should lay firm groundwork for exploring in detail the diversification of protein function, which is mediated by the fast expanding universe of alternative splicing variants.